MoveNetMultiPoseLighting
A convolutional neural network model that runs on RGB images and predicts the joint locations of the people in the image frame. Unlike its predecessor, the MoveNet.SinglePose model, this MoveNet.MultiPose model can detect multiple people in the image frame at the same time while still achieving real-time speed.
The edges between keypoints are available in org.jetbrains.kotlinx.dl.onnx.inference.posedetection.edgeKeyPointsPairs, and the keypoints themselves in org.jetbrains.kotlinx.dl.onnx.inference.posedetection.keyPoints.
The ``predictRaw`` method returns a float32 tensor of shape (1, 6, 56).
The first dimension is the batch dimension, which is always equal to 1.
The second dimension corresponds to the maximum number of instance detections.
The model can detect up to 6 people in the image frame simultaneously.
The third dimension represents the predicted bounding box/keypoint locations and scores.
The first 17 * 3 elements are the keypoint locations and scores in the format ``[y_0, x_0, s_0, y_1, x_1, s_1, …, y_16, x_16, s_16]``, where y_i, x_i, and s_i are the y-coordinate, the x-coordinate (both normalized to the image frame, i.e. in the range ``[0.0, 1.0]``), and the confidence score of the i-th joint, respectively.
The order of the 17 keypoint joints is: ``[nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]``.
The remaining 5 elements ``[ymin, xmin, ymax, xmax, score]`` represent the bounding box (in normalized coordinates) and the confidence score of the instance.
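The layout described above can be decoded with plain index arithmetic. The following is a minimal sketch, not part of the library: it assumes the batch dimension has already been dropped, so the input is 6 rows of 56 floats. The types ``Joint``, ``Box``, and ``Person``, the functions ``decodeRow`` and ``decodeOutput``, and the ``0.3`` score threshold are all hypothetical names and values chosen for illustration.

```kotlin
// Hypothetical holder types for this sketch; they are not KotlinDL classes.
data class Joint(val name: String, val x: Float, val y: Float, val score: Float)
data class Box(val xMin: Float, val yMin: Float, val xMax: Float, val yMax: Float)
data class Person(val joints: List<Joint>, val box: Box, val score: Float)

// Joint order as documented for the model output.
val JOINT_NAMES = listOf(
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle"
)

// Decodes one 56-element row: 17 * 3 keypoint values followed by
// [ymin, xmin, ymax, xmax, score]. All coordinates stay normalized.
fun decodeRow(row: FloatArray): Person {
    require(row.size == 56) { "Expected 56 values per instance, got ${row.size}" }
    val joints = JOINT_NAMES.mapIndexed { i, name ->
        // Each joint is stored as (y, x, score), so y comes first.
        Joint(name, x = row[3 * i + 1], y = row[3 * i], score = row[3 * i + 2])
    }
    // Indices 51..55 hold the bounding box and the instance score.
    val box = Box(xMin = row[52], yMin = row[51], xMax = row[54], yMax = row[53])
    return Person(joints, box, score = row[55])
}

// Decodes every instance row and keeps those whose instance score
// reaches the threshold (the default is an arbitrary assumption).
fun decodeOutput(rows: Array<FloatArray>, minScore: Float = 0.3f): List<Person> =
    rows.map { decodeRow(it) }.filter { it.score >= minScore }
```

To draw the result, multiply each normalized ``x`` by the image width and each ``y`` by the image height. Since the model always emits 6 rows, filtering by the instance score is what separates real detections from empty slots.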
@see Detailed description of MoveNet architecture in TensorFlow blog.
@see TensorFlow Model Hub with the MoveNetLighting model converted to ONNX.